Automatic Paragraph Identification: A Study across Languages and Domains

نویسندگان

  • Caroline Sporleder
  • Mirella Lapata
چکیده

In this paper we investigate whether paragraphs can be identified automatically in different languages and domains. We propose a machine learning approach which exploits textual and discourse cues and we assess how well humans perform on this task. Our best models achieve an accuracy that is significantly higher than the best baseline and, for most data sets, comes to within 6% of human performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning

This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were sorted out of a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...

متن کامل

Comprehension across Application Domains and Languages

This work demonstrates that our natural language understanding framework can be applied across application domains and languages with ease. Approaches towards language understanding generally involve much handcrafting, e.g. in writing grammars or annotating corpora, hence portability is a desirable trait in the development of language understanding systems. Our framework for natural language un...

متن کامل

Language Complexity, Accuracy and Fluency in Different Types of Writing Paragraph: Do the Raters Notice Such Effect

The aim of the present study was to investigate the effects of two types of paragraph on EFL learners’ written production. It addressed the issue of how three aspects of language production (i.e. complexity, accuracy, and fluency) vary among two types of paragraphs (i.e. paragraphs of chronology and cause-effect) written by EFL learners. Thirty intermediate level learners of English participate...

متن کامل

Acquiring entailment pairs across languages and domains: A Data Analysis

Entailment pairs are sentence pairs of a premise and a hypothesis, where the premise textually entails the hypothesis. Such sentence pairs are important for the development of Textual Entailment systems. In this paper, we take a closer look at a prominent strategy for their automatic acquisition from newspaper corpora, pairing first sentences of articles with their titles. We propose a simple l...

متن کامل

Kohonen Self Organizing for Automatic Identification of Cartographic Objects

Automatic identification and localization of cartographic objects in aerial and satellite images have gained increasing attention in recent years in digital photogrammetry and remote sensing. Although the automatic extraction of man made objects in essence is still an unresolved issue, the man made objects can be extracted from aerial photos and satellite images. Recently, the high-resolution s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004